Response to Notification of Non-Compliant Appeal Brief
[0001] Applicant hereby authorizes the Commissioner to charge any deficiency of 
fees and credit any overpayments to Deposit Account Number 12-0769. 
[0002] Applicant submits herein a corrected Claims Appendix for the Appeal Brief. 
the amendment be entered. In the previous amendment filed September 28, 2007, the
amendment to claim 85 contained an error in that the reference to claim 61, from which
replaces that reference.

[0003] Accordingly, Applicant also submits herein a corrected section IV of the 
Appeal Brief to provide the correct status of amendments. 
Claim Amendments

1. (Previously Presented) A data analysis system, comprising:

a first component associated with a server of the data analysis system that 
facilitates generation of a first data set related to web page information obtained via a 
communication system; and 

a second component that coordinates a second data set relating to web page 
information from at least one distributed resource associated with at least a client of the 
server which interacts with the communication system; the second data set is utilized to 
refine the first data set, wherein refining the first data set comprises adding unknown 
information to the first data set when new information is received from the distributed 
source via the second data set or updating existing information in the first data set when 
changes have occurred in the contents of the web page information as indicated by the 
second data set. 

2. (Original) The system of claim 1, the first component comprising an internet 
web crawler. 

3. (Original) The system of claim 1, the first component comprising an intranet 
web crawler. 

4. (Original) The system of claim 1, the second component further utilized to 
optimize reception of data from the distributed resources. 
5. (Original) The system of claim 1, the second component provides a scheduling
function to control reception of the second data set from the at least one distributed 
resource. 

6. (Original) The system of claim 1, the second component utilized to facilitate 
communication traffic reduction via the communication system by employing a proper 
set of weak indicator functions representative of the first data set. 

7. (Original) The system of claim 6, the second component further utilized to 
randomly select and transmit a weak indicator function selected from the proper set of 
weak indicator functions to at least one of the distributed resources. 

8. (Original) The system of claim 1, the second component further utilized to 
compare the first data set and the second data set to detect spoof data retrieved by the 
first component. 

9. (Original) The system of claim 1, the second component further utilized to 
generate status information about data related to the first data set; the status 
information transmitted to at least one distributed resource. 

10. (Original) The system of claim 9, the status information comprising, at least in 
part, a freshness flag to indicate freshness of information related to the first data set. 
11. (Original) The system of claim 9, the status information comprising, at least in
part, a hash of contents of information related to the first data set. 

12. (Original) The system of claim 9, the status information comprising, at least in 
part, a copy of information of the first data set. 

13. (Original) The system of claim 1, the communication system comprising an 
internet. 

14. (Original) The system of claim 1, the communication system comprising a 
world wide web. 

15. (Original) The system of claim 1, the communication system comprising an 
intranet. 

16. (Original) The system of claim 15, the intranet comprising a local area 
network. 

17. (Original) The system of claim 15, the intranet comprising a wide area 
network. 

18. (Canceled) 
19. (Original) The system of claim 1, the distributed resources comprising trusted
entities interactive with the communication system and the second component. 

20. (Original) The system of claim 1, the first data set comprising internet web 
page data. 

21. (Original) The system of claim 1, the first data set comprising intranet web 
page data. 

22. (Canceled) 

23. (Original) The system of claim 1, the second data set comprising, at least in 
part, a hash of contents of at least one web page. 

24. (Original) The system of claim 1, the second data set comprising, at least in 
part, a Uniform Resource Locator (URL) of at least one web page. 

25. (Original) The system of claim 1, the second data set comprising, at least in 
part, a time stamp relating to an acquisition time for information about at least one web 
page. 
26. (Previously Presented) The system of claim 1, the second data set
comprising, at least in part, a delta indication of the changes to contents of the at least 
one web page. 

27. (Original) The system of claim 26, the delta indication including, at least in 
part, a hash of previous contents of a web page and a hash of recent contents of the 
web page. 

28. (Original) The system of claim 1, the second data set comprising, at least in 
part, a status indication of changes to contents of at least one web page. 

29. (Original) The system of claim 28, the status indication including, at least in 
part, a percentage relating to an amount of change of contents of a web page. 

30. (Original) The system of claim 28, the status indication including, at least in 
part, a significance indicator to signify importance of changes in contents of a web page. 

31. (Original) The system of claim 1, the second data set comprising internet web
page data.

32. (Original) The system of claim 1, the second data set comprising intranet web
page data.

33. (Original) The system of claim 1, the second data set comprising data
compiled utilizing at least one weak indicator function randomly selected from a set of 
weak indicator functions; the set of weak indicator functions representative of the first 
data set. 

34. (Original) The system of claim 1, further comprising a search component to 
accept at least one search query and generate at least one search reply having at least 
a portion of the first data set represented by information embedded in the search reply. 

35. (Original) The system of claim 1, further comprising a web page server 
component to construct web pages having at least a portion of the first data set 
represented by information embedded in at least one link found on at least one 
constructed web page. 

36. (Original) The system of claim 1, further comprising a storage component to 
store the first data set. 

37. (Previously Presented) A method for facilitating data analysis, comprising:

generating a first data set relating to a second data set obtained from web pages
interactive with a server of a communication system;

receiving a third data set from at least one distributed resource comprising a 
client of the server that is interactive with the communication system; the third data set 
comprising web page related information generated by the distributed resource; and 



Serial No.: 10/670,681 

Atty Docket No.: MS1-3984US

Atty/Agent: Kayla D. Brant 






refining the second data set to reflect information obtained from the third data 
set, by: 

adding unknown information to the second data set when new information is 
received from the distributed source via the third data set; 

updating existing information in the second data set when changes have 
occurred as indicated by the third data set; and 

passing status information to the distributed resource through one or more 
indicators after information from the third data set has been analyzed. 

38. (Original) The method of claim 37, the first data set comprising a 
representation of the second data set. 

39. (Original) The method of claim 38, the representation of the second data set 
comprising, at least in part, a hash of contents of at least one web page contained in the 
second data set. 

40. (Original) The method of claim 38, the representation of the second data set 
comprising, at least in part, a status indication of at least one web page contained in the 
second data set. 

41. (Original) The method of claim 40, the status indication comprising a 
freshness flag to indicate if the web page information is current. 
42. (Original) The method of claim 37, the first data set comprising a copy of the
second data set. 

43. (Original) The method of claim 37, the second data set comprising web page 
information compiled by a web crawler. 

44. (Original) The method of claim 37, the third data set comprising web page 
information based upon client accessed web page information on the communication 
system. 

45. (Canceled) 

46. (Original) The method of claim 37, the communication system comprising an 
internet. 

47. (Original) The method of claim 37, the communication system comprising an 
intranet. 

48. (Canceled) 

49. (Original) The method of claim 37, further including: 
transmitting the first data set to at least one distributed resource that is interactive
with the communication system making the first data set available to be utilized by the 
distributed resource to generate the third data set. 

50. (Original) The method of claim 38, further including: 

generating a set of weak indicator functions to represent the second data set; 
and selecting random weak indicator functions from the set of weak indicator functions 
to transmit to the distributed resources as the first data set. 

51. (Original) The method of claim 50, the set of weak indicator functions 
comprising a proper set of weak indicator functions such that a non-zero probability 
exists that a randomly selected weak indicator function can identify a new web page. 

52. (Original) The method of claim 50, generating a set of weak indicator 
functions comprising: 

providing a dictionary representative of the second data set; 
partitioning randomly the dictionary into non-overlapping subdictionaries; and 
creating a function where l(x) = 1 if and only if at least one subdictionary's weak 
indicator function is equal to one. 

53. (Original) The method of claim 37, further including: 

comparing the third data set to the second data set to reveal spoof data included 
in the second data set. 
54. (Original) The method of claim 37, further including:

optimizing reception of at least one third data set through scheduling of the 
distributed resources. 

55. (Original) The method of claim 37, further including: 

receiving a web page search query from at least one distributed resource; 

generating a web search results page in response to the web page search query 
from the distributed resource; 

embedding portions of the first data set in links found on the web search results 
page; and 

transmitting the web search results page as a representation of at least a portion 
of the second data set to the distributed resource. 

56. (Original) The method of claim 37, further including: 

constructing a web page utilizing at least a portion of the first data set to embed 
information about links found in the web page; and 

transmitting the web page to disseminate the first data set to at least one 
distributed resource. 

57. (Previously Presented) A data analysis system, comprising: 

means for generating at least one first data set from a server of communication 
system; 
means for receiving and coordinating at least one second data set from at least
one client which interacts with the server of the communication system; and 

means for refining the first data set utilizing at least one second data set, wherein 
refining the first data set comprises the at least one of adding unknown information to 
the first data set when new information is received from the client via the second data 
set and updating existing information in the first data set when changes have occurred 
in the web page as indicated by the second data set. 

58. (Original) The system of claim 57, the means for generating at least one first 
data set including a web crawler. 

59. (Original) The system of claim 58, the first data set comprising data relating 
to web pages obtained by the web crawler. 

60. (Previously Presented) The system of claim 57, the second data set 
comprising web page comparison data compiled by the at least one client and based, at 
least in part, upon representative data of the first data set. 

61. (Previously Presented) A data analysis system, comprising:

a first component associated with at least one client of a distributed web crawling 
system that generates web page information from at least one visited web site for 
utilization in the distributed web crawling system; and 
a second component associated with a server that receives the web page
information transmitted by the first component via a communication system, wherein the 
first component receives a set of data from the second component to utilize in the 
generation of the web page information comprising at least comparison data based on 
the visited web page and the received set of data. 

62. (Original) The system of claim 61, the first component providing at least one 
time stamp relevant to a time of acquisition of data utilized in the generation of the web 
page information. 

63. (Original) The system of claim 61, the first component receiving a set of 
embedded web crawler data from at least one search result page to utilize in the 
generation of the web page information. 

64. (Original) The system of claim 61, the first component receiving a set of 
embedded web crawler data from at least one web page to utilize in the generation of 
the web page information. 

65. (Previously Presented) The system of claim 61, the first component further 
operational to obtain web page data indirectly via at least one other client of the 
distributed crawler system to provide a gateway to the second component to 
substantially reduce traffic flow to the second component. 
66. (Canceled)



67. (Original) The system of claim 61, the generated web page information 
comprising, at least in part, a status indication of changes to contents of at least one 
web page. 

68. (Original) The system of claim 67, the status indication including, at least in 
part, a percentage relating to an amount of change of contents of a web page. 

69. (Original) The system of claim 67, the status indication including, at least in 
part, a significance indicator to signify importance of changes in contents of a web page. 

70. (Original) The system of claim 61, at least a portion of the generated web 
page information made available for peer-to-peer client transmission via the 
communication system. 

71. (Original) The system of claim 61, the generated web page information 
compiled utilizing a randomly selected weak indicator function from a proper set of weak 
indicator functions that represent web page data compiled by a web crawler. 

72. (Original) The system of claim 61, the communication system comprising an 
internet. 
73. (Original) The system of claim 61, the communication system comprising an
intranet. 

74. (Original) The system of claim 61, further comprising a storage component to
store the web page information. 

75. (Original) The system of claim 61, further comprising a notification 
component that determines when and if the generated web page information is to be 
communicated via the communication system. 

76. (Previously Presented) The system of claim 75, the notification component 
receiving scheduling information from the second component; the scheduling 
information relating to obtaining and transmitting the generated web page information. 

77. (Canceled) 

78. (Previously Presented) The system of claim 61, the first component utilizing 
web search servers outside of the distributed web crawling system to retrieve data 
unknown to the second component. 

79. (Previously Presented) The system of claim 61, the first component making
the comparison data discretionarily available to the second component via the
communication system.

80. (Previously Presented) The system of claim 61, the comparison data
including, at least in part, at least one Uniform Resource Locator (URL) of at least one 
web page. 

81. (Previously Presented) The system of claim 61, the comparison data 
including, at least in part, a hash of contents of at least one web page representative of 
a recent web site visit. 

82. (Previously Presented) The system of claim 61, the comparison data 
including, at least in part, a delta indication of contents of at least one web page. 

83. (Original) The system of claim 82, the delta indication including, at least in 
part, a hash of previous contents of a web page and a hash of recent contents of the 
web page. 

84. (Previously Presented) The system of claim 61, the second component 
comprising a server of the distributed crawling system. 

85. (Currently Amended) The system of claim 61, the second component
comprising a client of the distributed crawling system. 






86. (Previously Presented) The system of claim 61, the generated web page 
information comprising data unknown to the second component. 

87. (Previously Presented) The system of claim 61, at least a portion of the 
received set of data made available for peer-to-peer client transmission via the 
communication system. 

88. (Previously Presented) The system of claim 61, the received set of data 
comprising a dictionary for data compiled by a web crawler. 

89. (Previously Presented) The system of claim 61, the received set of data 
comprising a representation of data compiled by a web crawler; the representation of 
data generated by utilizing a weak indicator function. 

90. (Previously Presented) The system of claim 61, the received set of data 
comprising a copy of data compiled by a web crawler. 

91. (Previously Presented) The system of claim 61, further comprising a storage 
component to store the set of data received from the second component. 

92. (Previously Presented) A method for facilitating data analysis, comprising:

compiling a first data set derived from accessing web pages via a client of a
communication system;

transmitting, selectively, the first data set to an entity comprising at least a server
of a distributed crawling system that is interactive with the communication system; 

receiving a representation of a second data set compiled by the server of the 
web crawler; the second data set relating to at least one web page from the 
communication system; and

utilizing the second data set to control which web pages to visit to compile the 
first data set. 

93. (Canceled) 

94. (Canceled) 

95. (Original) The method of claim 92, the first data set comprising, at least in 
part, a uniform resource locator (URL) for at least one web page. 

96. (Original) The method of claim 92, the first data set comprising, at least in 
part, a hash of contents of at least one web page. 

97. (Original) The method of claim 92, selectively transmitting based upon time of
day.

98. (Original) The method of claim 92, selectively transmitting based upon priority 
of at least one web page. 
99. (Original) The method of claim 92, selectively transmitting based upon
percentage of content change of at least one web page. 

100. (Original) The method of claim 92, selectively transmitting based upon 
identifying at least one new web page. 

101. (Canceled) 

102. (Previously Presented) The method of claim 92, receiving the 
representation of the second data set is accomplished via reception of a web page with 
embedded information derived from the second data set and generated by a web page 
hosting server with access to the second data set. 

103. (Previously Presented) The method of claim 92, receiving the representation 
of the second data set is accomplished via reception of a search results page with 
embedded information derived from the second data set and generated in response to a 
query transmitted to a search server having access to the second data set. 

104. (Canceled) 

105. (Previously Presented) The method of claim 92, further comprising: 
determining when to transmit the first data set via the communication system
based upon the second data set. 

106. (Original) The method of claim 105, the second data set containing a 
freshness indicator to indicate when its data is stale and requires updating via the first 
data set. 

107. (Original) The method of claim 105, the second data set containing a 
schedule for when the first data set is to be transmitted. 

108. (Previously Presented) The method of claim 92, further comprising:

comparing at least a portion of the second data set with at least a portion of
information obtained via accessing web pages to create comparison data; and

generating a representation of the comparison data to derive the first data set. 

109. (Original) The method of claim 108, the first data set comprising data 
unknown to the second data set. 

110. (Original) The method of claim 109, the unknown data comprising only 
unknown data derived from at least one search results page from a search server 
outside of the distributed crawling system. 
111. (Original) The method of claim 108, the first data set comprising content
changes to web pages represented by the second data set. 

112. (Original) The method of claim 108, the first data set comprising status 
information relating to web pages represented by the second data set. 

113. (Canceled) 

114. (Previously Presented) A computer readable medium having stored thereon 
computer executable components comprising: 

a first component associated with a server of the data analysis system that 
facilitates generation of a first data set related to web page information obtained via a 
communication system; and 

a second component that coordinates a second data set relating to web page 
information from at least one distributed resource associated with at least a client of the 
server which interacts with the communication system; 

the second data set is utilized to refine the first data set, wherein refining the first 
data set comprises adding unknown information to the first data set when new 
information is received from the distributed source via the second data set and updating 
existing information in the first data set when changes have occurred in the contents of 
the web page information as indicated by the second data set. 
115. (Original) A device employing the method of claim 37 comprising at least
one selected from the group consisting of a computer, a server, and a handheld 
electronic device. 

116. (Original) A device employing the system of claim 1 comprising at least one 
selected from the group consisting of a computer, a server, and a handheld electronic 
device. 
Remarks

[0004] Applicant respectfully requests that the amendment to claim 85 shown above 
be entered. Applicant further requests that sections IV and VIII of the Appeal Brief filed
May 5, 2008 be replaced with the corresponding sections below. 

IV. Status of Amendments (37 C.F.R. §41.37(c)(1)(iv)) 

An amendment to claim 85 is submitted herewith. 

VIII. Claims Appendix (37 C.F.R. §41.37(c)(1)(viii)) 

1. A data analysis system, comprising:

a first component associated with a server of the data analysis system that 
facilitates generation of a first data set related to web page information obtained via a 
communication system; and 

a second component that coordinates a second data set relating to web page 
information from at least one distributed resource associated with at least a client of the 
server which interacts with the communication system; the second data set is utilized to 
refine the first data set, wherein refining the first data set comprises adding unknown 
information to the first data set when new information is received from the distributed 
source via the second data set or updating existing information in the first data set when 
changes have occurred in the contents of the web page information as indicated by the 
second data set. 
2. The system of claim 1, the first component comprising an internet web crawler.

3. The system of claim 1, the first component comprising an intranet web crawler.

4. The system of claim 1, the second component further utilized to optimize 
reception of data from the distributed resources. 

5. The system of claim 1, the second component provides a scheduling function 
to control reception of the second data set from the at least one distributed resource. 

6. The system of claim 1, the second component utilized to facilitate 
communication traffic reduction via the communication system by employing a proper 
set of weak indicator functions representative of the first data set. 

7. The system of claim 6, the second component further utilized to randomly 
select and transmit a weak indicator function selected from the proper set of weak 
indicator functions to at least one of the distributed resources. 

8. The system of claim 1, the second component further utilized to compare the 
first data set and the second data set to detect spoof data retrieved by the first 
component. 
9. The system of claim 1, the second component further utilized to generate
status information about data related to the first data set; the status information 
transmitted to at least one distributed resource. 

10. The system of claim 9, the status information comprising, at least in part, a 
freshness flag to indicate freshness of information related to the first data set. 

11. The system of claim 9, the status information comprising, at least in part, a 
hash of contents of information related to the first data set. 

12. The system of claim 9, the status information comprising, at least in part, a 
copy of information of the first data set. 

13. The system of claim 1, the communication system comprising an internet.

14. The system of claim 1, the communication system comprising a world wide
web.

15. The system of claim 1, the communication system comprising an intranet.

16. The system of claim 15, the intranet comprising a local area network. 

17. The system of claim 15, the intranet comprising a wide area network. 
19. The system of claim 1, the distributed resources comprising trusted entities
interactive with the communication system and the second component. 

20. The system of claim 1, the first data set comprising internet web page data.

21. The system of claim 1, the first data set comprising intranet web page data.

23. The system of claim 1, the second data set comprising, at least in part, a
hash of contents of at least one web page. 

24. The system of claim 1, the second data set comprising, at least in part, a 
Uniform Resource Locator (URL) of at least one web page. 

25. The system of claim 1, the second data set comprising, at least in part, a time
stamp relating to an acquisition time for information about at least one web page. 

26. The system of claim 1, the second data set comprising, at least in part, a 
delta indication of the changes to contents of the at least one web page. 

27. The system of claim 26, the delta indication including, at least in part, a hash 
of previous contents of a web page and a hash of recent contents of the web page. 
28. The system of claim 1, the second data set comprising, at least in part, a
status indication of changes to contents of at least one web page. 

29. The system of claim 28, the status indication including, at least in part, a 
percentage relating to an amount of change of contents of a web page. 

30. The system of claim 28, the status indication including, at least in part, a 
significance indicator to signify importance of changes in contents of a web page. 

31. The system of claim 1, the second data set comprising internet web page
data.

32. The system of claim 1, the second data set comprising intranet web page
data.

33. The system of claim 1, the second data set comprising data compiled utilizing
at least one weak indicator function randomly selected from a set of weak indicator 
functions; the set of weak indicator functions representative of the first data set. 

34. The system of claim 1, further comprising a search component to accept at 
least one search query and generate at least one search reply having at least a portion 
of the first data set represented by information embedded in the search reply. 
35. The system of claim 1, further comprising a web page server component to
construct web pages having at least a portion of the first data set represented by 
information embedded in at least one link found on at least one constructed web page. 

36. The system of claim 1, further comprising a storage component to store the 
first data set. 

37. A method for facilitating data analysis, comprising: 

generating a first data set relating to a second data set obtained from web pages 
interactive with a server of a communication system; 

receiving a third data set from at least one distributed resource comprising a 
client of the server that is interactive with the communication system; the third data set 
comprising web page related information generated by the distributed resource; and 

refining the second data set to reflect information obtained from the third data 
set, by: 

adding unknown information to the second data set when new information is 
received from the distributed source via the third data set; 

updating existing information in the second data set when changes have 
occurred as indicated by the third data set; and 

passing status information to the distributed resource through one or more 
indicators after information from the third data set has been analyzed. 
38. The method of claim 37, the first data set comprising a representation of the
second data set. 

39. The method of claim 38, the representation of the second data set 
comprising, at least in part, a hash of contents of at least one web page contained in the 
second data set. 

40. The method of claim 38, the representation of the second data set 
comprising, at least in part, a status indication of at least one web page contained in the 
second data set. 

41. The method of claim 40, the status indication comprising a freshness flag to 
indicate if the web page information is current. 

42. The method of claim 37, the first data set comprising a copy of the second 
data set. 

43. The method of claim 37, the second data set comprising web page 
information compiled by a web crawler. 

44. The method of claim 37, the third data set comprising web page information 
based upon client accessed web page information on the communication system. 

46. The method of claim 37, the communication system comprising an internet. 

47. The method of claim 37, the communication system comprising an intranet. 

49. The method of claim 37, further including: 

transmitting the first data set to at least one distributed resource that is interactive 
with the communication system making the first data set available to be utilized by the 
distributed resource to generate the third data set. 

50. The method of claim 38, further including: 

generating a set of weak indicator functions to represent the second data set; 
and selecting random weak indicator functions from the set of weak indicator functions 
to transmit to the distributed resources as the first data set. 

51. The method of claim 50, the set of weak indicator functions comprising a 
proper set of weak indicator functions such that a non-zero probability exists that a 
randomly selected weak indicator function can identify a new web page. 

52. The method of claim 50, generating a set of weak indicator functions 
comprising: 

providing a dictionary representative of the second data set; 

partitioning randomly the dictionary into non-overlapping subdictionaries; and 

creating a function where l(x) = 1 if and only if at least one subdictionary's weak 
indicator function is equal to one. 

53. The method of claim 37, further including: 

comparing the third data set to the second data set to reveal spoof data included 
in the second data set. 

54. The method of claim 37, further including: 

optimizing reception of at least one third data set through scheduling of the 
distributed resources. 

55. The method of claim 37, further including: 

receiving a web page search query from at least one distributed resource; 

generating a web search results page in response to the web page search query 
from the distributed resource; 

embedding portions of the first data set in links found on the web search results 
page; and 

transmitting the web search results page as a representation of at least a portion 
of the second data set to the distributed resource. 

56. The method of claim 37, further including: 

constructing a web page utilizing at least a portion of the first data set to embed 
information about links found in the web page; and 

transmitting the web page to disseminate the first data set to at least one 
distributed resource. 

57. A data analysis system, comprising: 

means for generating at least one first data set from a server of communication 
system; 

means for receiving and coordinating at least one second data set from at least 
one client which interacts with the server of the communication system; and 

means for refining the first data set utilizing at least one second data set, wherein 
refining the first data set comprises the at least one of adding unknown information to 
the first data set when new information is received from the client via the second data 
set and updating existing information in the first data set when changes have occurred 
in the web page as indicated by the second data set. 

58. The system of claim 57, the means for generating at least one first data set 
including a web crawler. 

59. The system of claim 58, the first data set comprising data relating to web 
pages obtained by the web crawler. 

60. The system of claim 57, the second data set comprising web page 
comparison data compiled by the at least one client and based, at least in part, upon 
representative data of the first data set. 

61. A data analysis system, comprising: 

a first component associated with at least one client of a distributed web crawling 
system that generates web page information from at least one visited web site for 
utilization in the distributed web crawling system; and 

a second component associated with a server that receives the web page 
information transmitted by the first component via a communication system, wherein the 
first component receives a set of data from the second component to utilize in the 
generation of the web page information comprising at least comparison data based on 
the visited web page and the received set of data. 

62. The system of claim 61, the first component providing at least one time stamp 
relevant to a time of acquisition of data utilized in the generation of the web page 
information. 

63. The system of claim 61, the first component receiving a set of embedded web 
crawler data from at least one search result page to utilize in the generation of the web 
page information. 

64. The system of claim 61, the first component receiving a set of embedded web 
crawler data from at least one web page to utilize in the generation of the web page 
information. 

65. The system of claim 61, the first component further operational to obtain web 
page data indirectly via at least one other client of the distributed crawler system to 
provide a gateway to the second component to substantially reduce traffic flow to the 
second component. 

67. The system of claim 61, the generated web page information comprising, at 
least in part, a status indication of changes to contents of at least one web page. 

68. The system of claim 67, the status indication including, at least in part, a 
percentage relating to an amount of change of contents of a web page. 

69. The system of claim 67, the status indication including, at least in part, a 
significance indicator to signify importance of changes in contents of a web page. 

70. The system of claim 61, at least a portion of the generated web page 
information made available for peer-to-peer client transmission via the communication 
system. 

71. The system of claim 61, the generated web page information compiled 
utilizing a randomly selected weak indicator function from a proper set of weak indicator 
functions that represent web page data compiled by a web crawler. 

72. The system of claim 61, the communication system comprising an internet. 

73. The system of claim 61, the communication system comprising an intranet. 

74. The system of claim 61, further comprising a storage component to store the 
web page information. 

75. The system of claim 61, further comprising a notification component that 
determines when and if the generated web page information is to be communicated via 
the communication system. 

76. The system of claim 75, the notification component receiving scheduling 
information from the second component; the scheduling information relating to obtaining 
and transmitting the generated web page information. 

78. The system of claim 61, the first component utilizing web search servers 
outside of the distributed web crawling system to retrieve data unknown to the second 
component. 

79. The system of claim 61, the first component making the comparison data 
discretionarily available to the second component via the communication system. 

80. The system of claim 61, the comparison data including, at least in part, at 
least one Uniform Resource Locator (URL) of at least one web page. 

81. The system of claim 61, the comparison data including, at least in part, a 
hash of contents of at least one web page representative of a recent web site visit. 

82. The system of claim 61, the comparison data including, at least in part, a 
delta indication of contents of at least one web page. 

83. The system of claim 82, the delta indication including, at least in part, a hash 
of previous contents of a web page and a hash of recent contents of the web page. 

84. The system of claim 61, the second component comprising a server of the 
distributed crawling system. 

85. The system of claim 61, the second component comprising a client of the 
distributed crawling system. 

86. The system of claim 61, the generated web page information comprising data 
unknown to the second component. 

87. The system of claim 61, at least a portion of the received set of data made 
available for peer-to-peer client transmission via the communication system. 

88. The system of claim 61, the received set of data comprising a dictionary for 
data compiled by a web crawler. 

89. The system of claim 61, the received set of data comprising a representation 
of data compiled by a web crawler; the representation of data generated by utilizing a 
weak indicator function. 

90. The system of claim 61, the received set of data comprising a copy of data 
compiled by a web crawler. 

91. The system of claim 61, further comprising a storage component to store the 
set of data received from the second component. 

92. A method for facilitating data analysis, comprising: 

compiling a first data set derived from accessing web pages via a client of a 
communication system; 

transmitting, selectively, the first data set to an entity comprising at least a server 
of a distributed crawling system that is interactive with the communication system; 

receiving a representation of a second data set compiled by the server of the 
web crawler; the second data set relating to at least one web page from the 
communication system; and 

utilizing the second data set to control which web pages to visit to compile the 
first data set. 

95. The method of claim 92, the first data set comprising, at least in part, a 
uniform resource locator (URL) for at least one web page. 

96. The method of claim 92, the first data set comprising, at least in part, a hash 
of contents of at least one web page. 

97. The method of claim 92, selectively transmitting based upon time of day. 

98. The method of claim 92, selectively transmitting based upon priority of at 
least one web page. 

99. The method of claim 92, selectively transmitting based upon percentage of 
content change of at least one web page. 

100. The method of claim 92, selectively transmitting based upon identifying at 
least one new web page. 

102. The method of claim 92, receiving the representation of the second data set 
is accomplished via reception of a web page with embedded information derived from 
the second data set and generated by a web page hosting server with access to the 
second data set. 

103. The method of claim 92, receiving the representation of the second data set 
is accomplished via reception of a search results page with embedded information 
derived from the second data set and generated in response to a query transmitted to a 
search server having access to the second data set. 

105. The method of claim 92, further comprising: 

determining when to transmit the first data set via the communication system 
based upon the second data set. 

106. The method of claim 105, the second data set containing a freshness 
indicator to indicate when its data is stale and requires updating via the first data set. 

107. The method of claim 105, the second data set containing a schedule for 
when the first data set is to be transmitted. 

108. The method of claim 92, further comprising: 

comparing at least a portion of the second data set with at least a portion of 
information obtained via accessing web pages to create comparison data; and 

generating a representation of the comparison data to derive the first data set. 

109. The method of claim 108, the first data set comprising data unknown to the 
second data set. 

110. The method of claim 109, the unknown data comprising only unknown data 
derived from at least one search results page from a search server outside of the 
distributed crawling system. 

111. The method of claim 108, the first data set comprising content changes to 
web pages represented by the second data set. 

112. The method of claim 108, the first data set comprising status information 
relating to web pages represented by the second data set. 

114. A computer readable medium having stored thereon computer executable 
components comprising: 

a first component associated with a server of the data analysis system that 
facilitates generation of a first data set related to web page information obtained via a 
communication system; and 

a second component that coordinates a second data set relating to web page 
information from at least one distributed resource associated with at least a client of the 
server which interacts with the communication system; 

the second data set is utilized to refine the first data set, wherein refining the first 
data set comprises adding unknown information to the first data set when new 
information is received from the distributed source via the second data set and updating 
existing information in the first data set when changes have occurred in the contents of 
the web page information as indicated by the second data set. 

115. A device employing the method of claim 37 comprising at least one selected 
from the group consisting of a computer, a server, and a handheld electronic device. 

116. A device employing the system of claim 1 comprising at least one selected 
from the group consisting of a computer, a server, and a handheld electronic device. 

Conclusion 

[0005] Please contact the undersigned representative for the Applicant if any issues 
remain that will prevent the above sections from being corrected in the Appeal Brief. 

Respectfully Submitted, 

Lee & Hayes, PLLC 
Representative for Applicant 

/Kayla D. Brant #46,576/ Dated: June 2, 2009 

Kayla D. Brant 

(kayla@leehayes.com; 509-944-4742) 
Registration No. 46576 